Robust Speech Recognition Based on Localized Spectro-temporal Features
Abstract
To improve automatic speech recognition performance in adverse conditions, localized spectro-temporal features (LSTFs) are investigated; they are motivated by physiological measurements in the primary auditory cortex. In the Aurora2 experimental setup, Gabor-shaped LSTFs combined with a Tandem system yield robust performance with a feature set size of 30. If computational constraints allow, the set size may be increased to up to 70 features with some beneficial effect. There is supporting evidence that the previously chosen 1.5 periods within the envelope are a reasonable choice. Improved results can be obtained by using a Hanning window instead of a cut-off Gaussian envelope, owing to its better modulation frequency characteristics. Combined spectro-temporal modulation filters play an important role in characterizing speech, as more than 40% of all automatically selected features exhibit diagonal characteristics.
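The filter shape described in the abstract can be illustrated with a minimal sketch: a two-dimensional complex carrier multiplied by either a Hanning or a truncated Gaussian envelope whose support spans 1.5 carrier periods on each axis. This is not the authors' implementation. The function and parameter names (`gabor_lstf`, `omega_f`, `omega_t`, `kind`) are hypothetical, the Gaussian width (`sigma` set to a quarter of the support) is an assumed value, and both modulation frequencies are assumed to be nonzero, which corresponds to a diagonal filter.

```python
import numpy as np

def _support(omega, periods):
    """Sample offsets covering `periods` carrier periods, centered on zero."""
    if omega == 0:
        raise ValueError("this sketch assumes a nonzero modulation frequency")
    length = int(round(periods * 2 * np.pi / abs(omega)))
    length = max(length | 1, 3)  # force an odd length so a center sample exists
    return np.arange(length) - length // 2

def _envelope(x, kind):
    """One-dimensional envelope over the offsets x."""
    n = len(x)
    if kind == "hann":
        # Drop the two zero-valued end points of the Hanning window
        return np.hanning(n + 2)[1:-1]
    # Truncated ("cut-off") Gaussian; sigma = n / 4 is an assumed width
    sigma = n / 4.0
    return np.exp(-x**2 / (2 * sigma**2))

def gabor_lstf(omega_f, omega_t, periods=1.5, kind="hann"):
    """Complex 2-D Gabor-shaped filter, shape (frequency channels, time frames).

    omega_f and omega_t are the spectral and temporal modulation
    frequencies in radians per channel and radians per frame.
    """
    f = _support(omega_f, periods)
    t = _support(omega_t, periods)
    env = np.outer(_envelope(f, kind), _envelope(t, kind))
    carrier = np.exp(1j * (omega_f * f[:, None] + omega_t * t[None, :]))
    return env * carrier

# Example: carrier period of 6 channels and 10 frames
hann_filter = gabor_lstf(2 * np.pi / 6, 2 * np.pi / 10)
gauss_filter = gabor_lstf(2 * np.pi / 6, 2 * np.pi / 10, kind="gauss")
print(hann_filter.shape, gauss_filter.shape)
```

In this sketch the Hanning envelope decays almost to zero at the edge of the support, whereas the truncated Gaussian still has noticeable weight where it is cut off. The DC component is not removed here, which a practical feature extractor would likely need to address.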
Similar resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Spectro-temporal modulations for robust speech emotion recognition
Speech emotion recognition is mostly considered in clean speech. In this paper, joint spectro-temporal features (RS features) are extracted from an auditory model and are applied to detect the emotion status of noisy speech. The noisy speech is derived from the Berlin Emotional Speech database with added white and babble noises under various SNR levels. The clean train/noisy test scenario is in...
Multi-stream spectro-temporal features for robust speech recognition
A multi-stream approach to utilizing the inherently large number of spectro-temporal features for speech recognition is investigated in this study. Instead of reducing the feature-space dimension, this method divides the features into streams so that each represents a patch of information in the spectro-temporal response field. When used in combination with MFCCs for speech recognition under both...
Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.
To test whether simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was decomposed into a spectral one-dimensional Gabor filter bank and a temporal one-dimensional-Gabor...